1 Supporting valid-time indeterminacy 87%
Curtis E. Dyreson , Richard T. Snodgrass 
ACM Transactions on Database Systems (TODS) March 1998 
Volume 23 Issue 1 

In valid-time indeterminacy it is known that an event stored in a database did in fact occur, but 
it is not known exactly when. In this paper we extend the SQL data model and query language 
to support valid-time indeterminacy. We represent the occurrence time of an event with a set 
of possible instants, delimiting when the event might have occurred, and a probability 
distribution over that set. We also describe query language constructs to retrieve informat ... 



2 Meaningful term extraction and discriminative term selection in text categorization via unknown-word methodology 82%
Yu-Sheng Lai , Chung-Hsien Wu 

ACM Transactions on Asian Language Information Processing (TALIP) March 2002 
Volume 1 Issue 1 

In this article, an approach based on unknown words is proposed for meaningful term 
extraction and discriminative term selection in text categorization. For meaningful term 
extraction, a phrase-like unit (PLU)-based likelihood ratio is proposed to estimate the 
likelihood that a word sequence is an unknown word. On the other hand, a discriminative 
measure is proposed for term selection and is combined with the PLU-based likelihood ratio 
to determine the text category. We conducted several experim ... 
Yuan-Chi Chang , Chung-Sheng Li , John R. Smith

Proceedings of the 4th ACM conference on Electronic commerce June 2003 
Economics research has long recognized that bundling enables savings in production and 
transaction costs, promotes complementarity among the bundle components and sorts
consumers according to their valuations. Sellers employ market analysis and intelligence to 
extract the most surplus. In the age of electronic commerce with low product information 
access cost, buyers can take advantage of the benefits of bundling by performing dynamic 
composition of goods from multiple companies offering heterogen ... 

Garbage collection in object-oriented databases using transactional cyclic reference counting 80% 

P. Roy , S. Seshadri , A. Silberschatz , S. Sudarshan , S. Ashwin 

The VLDB Journal — The International Journal on Very Large Data Bases 

August 1998 

Volume 7 Issue 3 

Garbage collection is important in object-oriented databases to free the programmer from 
explicitly deallocating memory. In this paper, we present a garbage collection algorithm, 
called Transactional Cyclic Reference Counting (TCRC), for object-oriented databases. The 
algorithm is based on a variant of a reference-counting algorithm proposed for functional 
programming languages. The algorithm keeps track of auxiliary reference count information to
detect and collect cyclic garbage. The algorithm ... 

Process variation: Explicit computation of performance as a function of process variation 80% 
Lou Scheffer 

Proceedings of the 8th ACM/IEEE international workshop on Timing issues in the 
specification and synthesis of digital systems December 2002 

Each manufactured chip is a little bit different, and designers want as many as possible of 
these chips to work. Process variation is a function of many variables, as the width, thickness, 
and inter-layer thickness can vary independently for each layer on a chip, as can temperature 
and voltage. Currently designers cope with this by picking a few subsets of these conditions, 
called process corners, and analyzing at these conditions. However, it's easy to show this 
approach is both too conservativ ... 

Computing curricula 2001 80% 
Journal on Educational Resources in Computing (JERIC) September 2001 

Software process validation: quantitatively measuring the correspondence of a process to a model 80%

Jonathan E. Cook , Alexander L. Wolf 

ACM Transactions on Software Engineering and Methodology (TOSEM) April 1999 
Volume 8 Issue 2 

To a great extent, the usefulness of a formal model of a software process lies in its ability to 
accurately predict the behavior of the executing process. Similarly, the usefulness of an 
executing process lies largely in its ability to fulfill the requirements embodied in a formal 
model of the process. When process models and process executions diverge, something 
significant is happening. We have developed techniques for uncovering and measuring the 
discrepancies between models and executio ... 
8 Classification and regression: money *can* grow on trees 80%
Johannes Gehrke , Wei-Yin Loh , Raghu Ramakrishnan

Tutorial notes of the fifth ACM SIGKDD international conference on Knowledge 
discovery and data mining August 1999 

With over 800 million pages covering most areas of human endeavor, the World-wide Web is 
a fertile ground for data mining research to make a difference to the effectiveness of 
information search. Today, Web surfers access the Web through two dominant interfaces:
clicking on hyperlinks and searching via keyword queries. This process is often tentative and
unsatisfactory. Better support is needed for expressing one's information need and dealing with
a search result in more structured ways than ... 



9 An overview of query optimization in relational systems 80% 
Surajit Chaudhuri 

Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on 
Principles of database systems May 1998 



10 Structured documents: Searching XML documents via XML fragments 77% 
David Carmel , Yoelle S. Maarek , Matan Mandelbrod , Yosi Mass , Aya Soffer

Proceedings of the 26th annual international ACM SIGIR conference on Research and
development in information retrieval July 2003

Most of the work on XML query and search has stemmed from the publishing and database 
communities, mostly for the needs of business applications. Recently, the Information 
Retrieval community began investigating the XML search issue to answer information 
discovery needs. Following this trend, we present here an approach where information needs 
can be expressed in an approximate manner as pieces of XML documents or "XML 
fragments" of the same nature as the documents that are being searched. We pr ... 



11 Reconstructing sets from interpoint distances (extended abstract) 77% 

Steven S. Skiena , Warren D. Smith , Paul Lemke

Proceedings of the sixth annual symposium on Computational geometry May 1990 
We consider the problem of determining which point sets in some given space realise a given 
distance multiset. Special cases include the “turnpike problem” where the
points lie on a line, and the “beltway problem” where the points lie on a loop.
Of interest is the algorithmic problem of determining such point sets for a given collection of 
distances and the combinatorial problem of finding bounds on the maximum number of 
different solutions. These problems find appli ... 



12 Rendering CSG models with a ZZ-buffer 77% 
David Salesin , Jorge Stolfi

ACM SIGGRAPH Computer Graphics , Proceedings of the 17th annual conference on
Computer graphics and interactive techniques September 1990

Volume 24 Issue 4 



13 Complex relationships and knowledge discovery support in the InfoQuilt system 77%
A. Sheth , S. Thacker , S. Patel 

The VLDB Journal — The International Journal on Very Large Data Bases May 2003
Volume 12 Issue 1

Support for semantic content is becoming more common in Web-accessible information 
systems. We see this support emerging with the use of ontologies and machine-readable, 
annotated documents. The practice of domain modeling coupled with the extraction of 
domain-specific, contextually relevant metadata also supports the use of semantics. These 
advancements enable knowledge discovery approaches that define complex relationships 
between data that is autonomously collected and managed. The InfoQuilt ... 



14 Information retrieval 2: Text joins in an RDBMS for web data integration 77% 
Luis Gravano , Panagiotis G. Ipeirotis , Nick Koudas , Divesh Srivastava 
Proceedings of the twelfth international conference on World Wide Web May 2003 
The integration of data produced and collected across autonomous, heterogeneous web 
services is an increasingly important and challenging problem. Due to the lack of global 
identifiers, the same entity (e.g., a product) might have different textual representations across 
databases. Textual data is also often noisy because of transcription errors, incomplete 
information, and lack of standard formats. A fundamental task during data integration is 
matching of strings that refer to the same entity. ... 



15 Frequent patterns II: Mining frequent item sets by opportunistic projection 77% 
Junqiang Liu , Yunhe Pan , Ke Wang , Jiawei Han 

Proceedings of the eighth ACM SIGKDD international conference on Knowledge 
discovery and data mining July 2002 

In this paper, we present a novel algorithm Opportune Project for mining complete set of 
frequent item sets by projecting databases to grow a frequent item set tree. Our algorithm is 
fundamentally different from those proposed in the past in that it opportunistically chooses 
between two different structures, array-based or tree-based, to represent projected transaction 
subsets, and heuristically decides to build unfiltered pseudo projection or to make a filtered 
copy according to features of the ... 



16 Streams and time series: On the need for time series data mining benchmarks: a survey and empirical demonstration 77%

Eamonn Keogh , Shruti Kasetty 

Proceedings of the eighth ACM SIGKDD international conference on Knowledge 
discovery and data mining July 2002 

In the last decade there has been an explosion of interest in mining time series data. Literally 
hundreds of papers have introduced new algorithms to index, classify, cluster and segment 
time series. In this work we make the following claim. Much of this work has very little utility 
because the contribution made (speed in the case of indexing, accuracy in the case of 
classification and clustering, model accuracy in the case of segmentation) offer an amount of 
"improvement" that would have been c ... 



17 Fast image retrieval using color-spatial information 77% 
Beng Chin Ooi , Kian-Lee Tan , Tat Seng Chua , Wynne Hsu

The VLDB Journal — The International Journal on Very Large Data Bases May 1998

Volume 7 Issue 2 

In this paper, we present an image retrieval system that employs both the color and spatial 
information of images to facilitate the retrieval process. The basic unit used in our technique is 
a single-colored cluster, which bounds a homogeneous region of that color in an image. Two
clusters from two images are similar if they are of the same color and overlap in the image 
space. The number of clusters that can be extracted from an image can be very large, and it 
affects the accuracy of ret ... 

18 Whole-genome comparative annotation and regulatory motif discovery in multiple yeast species 77%

Manolis Kamvysselis , Nick Patterson , Bruce Birren , Bonnie Berger , Eric Lander
Proceedings of the seventh annual international conference on Computational molecular 
biology April 2003 

In [13] we reported the genome sequences of S. paradoxus, S. mikatae and S. bayanus and
compared these three yeast species to their close relative, S. cerevisiae. Genome-wide 
comparative analysis allowed the identification of functionally important sequences, both 
coding and non-coding. In this companion paper we describe the mathematical and 
algorithmic results underpinning the analysis of these genomes. We developed statistical 
methods for the systematic de-novo identification of ...

19 Exploiting hierarchical domain structure to compute similarity 77% 
Prasanna Ganesan , Hector Garcia-Molina , Jennifer Widom

ACM Transactions on Information Systems (TOIS) January 2003 
Volume 21 Issue 1 

The notion of similarity between objects finds use in many contexts, for example, in search 
engines, collaborative filtering, and clustering. Objects being compared often are modeled as 
sets, with their similarity traditionally determined based on set intersection. Intersection-based 
measures do not accurately capture similarity in certain domains, such as when the data is 
sparse or when there are known relationships between items within sets. We propose new 
measures that exploit a hierarchical ... 

20 What have we learnt from using real parallel machines to solve real problems? 77% 
G. C. Fox

Proceedings of the third conference on Hypercube concurrent computers and 
applications - Volume 2 January 1989 

We briefly review some key scientific and parallel processing issues in a selection of some 84 
existing applications of parallel machines. We include the MIMD hypercube transputer array, 
BBN Butterfly, and the SIMD ICL DAP, Goodyear MPP and Connection Machine from 
Thinking Machines. We use a space-time analogy to classify problems and show how a 
division into synchronous, loosely synchronous and asynchronous problems is helpful. This 
classifies problems into those suitable for SIMD or MIMD ... 